AITopics | bert 0

Collaborating Authors

bert 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Moral Gap of Large Language Models

Skorski, Maciej, Landowska, Alina

arXiv.org Artificial IntelligenceJul-25-2025

MFT has found numerous aplications, including analysis of political ideology (Graham et al., 2009), environmental attitudes (Fein-berg and Willer, 2013), vaccine hesitancy (Amin et al., 2017), social "Everyone deserves equal access to healthcare regardless of income" Fairness "Respect your elders and follow traditional "Stand with our troops - they sacrifice everything for our freedom" Loyalty "Marriage is sacred and should be protected The advent of deep learning and particularly transformer architectures marked a significant advancement in moral content analysis. Hoover et al. (2020) first applied deep learning models to moral The recent proposal of applying LLMs to moral content categorization (Bulla et al., 2025) showed promise but suffered from methodological limitations.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.13140/RG.2.2.26221.70880

2507.18523

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (0.54)
Health & Medicine > Therapeutic Area > Immunology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Explaining How Visual, Textual and Multimodal Encoders Share Concepts

Cornet, Clément, Besançon, Romaric, Borgne, Hervé Le

arXiv.org Artificial IntelligenceJul-25-2025

Sparse autoencoders (SAEs) have emerged as a powerful technique for extracting human-interpretable features from neural networks activations. Previous works compared different models based on SAE-derived features but those comparisons have been restricted to models within the same modality. We propose a novel indicator allowing quantitative comparison of models across SAE features, and use it to conduct a comparative study of visual, textual and multimodal encoders. We also propose to quantify the Comparative Sharedness of individual features between different classes of models. With these two new tools, we conduct several studies on 21 encoders of the three types, with two significantly different sizes, and considering generalist and domain specific datasets. The results allow to revisit previous studies at the light of encoders trained in a multimodal context and to quantify to which extent all these models share some representations or features. They also suggest that visual features that are specific to VLMs among vision encoders are shared with text encoders, highlighting the impact of text pretraining. The code is available at https://github.com/CEA-LIST/SAEshareConcepts

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.18512

Country:

Europe (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Transferrable Surrogates in Expressive Neural Architecture Search Spaces

Qin, Shiwen, Kadlecová, Gabriela, Pilát, Martin, Cohen, Shay B., Neruda, Roman, Crowley, Elliot J., Lukasik, Jovita, Ericsson, Linus

arXiv.org Artificial IntelligenceJul-4-2025

Neural architecture search (NAS) faces a challenge in balancing the exploration of expressive, broad search spaces that enable architectural innovation with the need for efficient evaluation of architectures to effectively search such spaces. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate models trained either using zero-cost-proxy metrics and neural graph features (GRAF) or by fine-tuning an off-the-shelf LM have high predictive power for the performance of architectures both within and across datasets, ii) these surrogates can be used to filter out bad architectures when searching on novel datasets, thereby significantly speeding up search and achieving better final performances, and iii) the surrogates can be further used directly as the search objective for huge speed-ups.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.12971

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Injecting Bias into Text Classification Models using Backdoor Attacks

Yavuz, A. Dilara, Gursoy, M. Emre

arXiv.org Artificial IntelligenceDec-25-2024

The rapid growth of natural language processing (NLP) and pre-trained language models have enabled accurate text classification in a variety of settings. However, text classification models are susceptible to backdoor attacks, where an attacker embeds a trigger into the victim model to make the model predict attacker-desired labels in targeted scenarios. In this paper, we propose to utilize backdoor attacks for a new purpose: bias injection. We develop a backdoor attack in which a subset of the training dataset is poisoned to associate strong male actors with negative sentiment. We execute our attack on two popular text classification datasets (IMDb and SST) and seven different models ranging from traditional Doc2Vec-based models to LSTM networks and modern transformer-based BERT and RoBERTa models. Our results show that the reduction in backdoored models' benign classification accuracy is limited, implying that our attacks remain stealthy, whereas the models successfully learn to associate strong male actors with negative sentiment (100% attack success rate with >= 3% poison rate). Attacks on BERT and RoBERTa are particularly more stealthy and effective, demonstrating an increased risk of using modern and larger models. We also measure the generalizability of our bias injection by proposing two metrics: (i) U-BBSR which uses previously unseen words when measuring attack success, and (ii) P-BBSR which measures attack success using paraphrased test samples. U-BBSR and P-BBSR results show that the bias injected by our attack can go beyond memorizing a trigger phrase.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.18975

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English

Kathunia, Aekansh, Kaif, Mohammad, Arora, Nalin, Narotam, N

arXiv.org Artificial IntelligenceMay-5-2024

People communicate in more than 7,000 languages around the world, with around 780 languages spoken in India alone. Despite this linguistic diversity, research on Sentiment Analysis has predominantly focused on English text data, resulting in a disproportionate availability of sentiment resources for English. This paper examines the performance of transformer models in Sentiment Analysis tasks across multilingual datasets and text that has undergone machine translation. By comparing the effectiveness of these models in different linguistic contexts, we gain insights into their performance variations and potential implications for sentiment analysis across diverse languages. We also discuss the shortcomings and potential for future work towards the end.

bert 0, machine translation, sentiment analysis, (11 more...)

arXiv.org Artificial Intelligence

2405.02887

Country:

Asia > India (0.24)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Tōhoku (0.05)
(6 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU

Madureira, Brielen, Schlangen, David

arXiv.org Artificial IntelligenceMar-28-2024

While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence that is to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). We investigate how they behave under incremental interfaces, when partial output must be provided based on partial input seen up to a certain time step, which may happen in interactive systems. We test five models on various NLU datasets and compare their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality. The "omni-directional" BERT model, which achieves better non-incremental performance, is impacted more by the incremental access. This can be alleviated by adapting the training regime (truncated training), or the testing procedure, by delaying the output until some right context is available or by incorporating hypothetical right contexts generated by a language model like GPT-2.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2010.0533

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Using Domain Knowledge to Guide Dialog Structure Induction via Neural Probabilistic Soft Logic

Pryor, Connor, Yuan, Quan, Liu, Jeremiah, Kazemi, Mehran, Ramachandran, Deepak, Bedrax-Weiss, Tania, Getoor, Lise

arXiv.org Artificial IntelligenceMar-26-2024

Dialog Structure Induction (DSI) is the task of inferring the latent dialog structure (i.e., a set of dialog states and their temporal transitions) of a given goal-oriented dialog. It is a critical component for modern dialog system design and discourse analysis. Existing DSI approaches are often purely data-driven, deploy models that infer latent states without access to domain knowledge, underperform when the training corpus is limited/noisy, or have difficulty when test dialogs exhibit distributional shifts from the training domain. This work explores a neural-symbolic approach as a potential solution to these problems. We introduce Neural Probabilistic Soft Logic Dialogue Structure Induction (NEUPSL DSI), a principled approach that injects symbolic knowledge into the latent space of a generative neural model. We conduct a thorough empirical investigation on the effect of NEUPSL DSI learning on hidden representation quality, few-shot learning, and out-of-domain generalization performance. Over three dialog structure induction datasets and across unsupervised and semi-supervised settings for standard and cross-domain generalization, the injection of symbolic knowledge using NEUPSL DSI provides a consistent boost in performance over the canonical baselines.

constraint, structure induction, utterance, (13 more...)

arXiv.org Artificial Intelligence

2403.17853

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
(2 more...)

Add feedback

A State-Vector Framework for Dataset Effects

Sahak, Esmat, Zhu, Zining, Rudzicz, Frank

arXiv.org Artificial IntelligenceOct-16-2023

The impressive success of recent deep neural network (DNN)-based systems is significantly influenced by the high-quality datasets used in training. However, the effects of the datasets, especially how they interact with each other, remain underexplored. We propose a state-vector framework to enable rigorous studies in this direction. This framework uses idealized probing test results as the bases of a vector space. This framework allows us to quantify the effects of both standalone and interacting datasets. We show that the significant effects of some commonly-used language understanding datasets are characteristic and are concentrated on a few linguistic dimensions. Additionally, we observe some ``spill-over'' effects: the datasets could impact the models along dimensions that may seem unrelated to the intended tasks. Our state-vector framework paves the way for a systematic understanding of the dataset effects, a crucial component in responsible and robust model development.

dataset, dimension, interaction effect, (13 more...)

arXiv.org Artificial Intelligence

2310.10955

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(9 more...)

Genre: Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

PerturbScore: Connecting Discrete and Continuous Perturbations in NLP

Li, Linyang, Ren, Ke, Shao, Yunfan, Wang, Pengyu, Qiu, Xipeng

arXiv.org Artificial IntelligenceOct-13-2023

With the rapid development of neural network applications in NLP, model robustness problem is gaining more attention. Different from computer vision, the discrete nature of texts makes it more challenging to explore robustness in NLP. Therefore, in this paper, we aim to connect discrete perturbations with continuous perturbations, therefore we can use such connections as a bridge to help understand discrete perturbations in NLP models. Specifically, we first explore how to connect and measure the correlation between discrete perturbations and continuous perturbations. Then we design a regression task as a PerturbScore to learn the correlation automatically. Through experimental results, we find that we can build a connection between discrete and continuous perturbations and use the proposed PerturbScore to learn such correlation, surpassing previous methods used in discrete perturbation measuring. Further, the proposed PerturbScore can be well generalized to different datasets, perturbation methods, indicating that we can use it as a powerful tool to study model robustness in NLP.

continuous perturbation, correlation, perturbation, (15 more...)

arXiv.org Artificial Intelligence

2310.08889

Country:

North America > Canada (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Hierarchical Pretraining for Biomedical Term Embeddings

Cai, Bryan, Zeng, Sihang, Lin, Yucong, Yuan, Zheng, Zhou, Doudou, Tian, Lu

arXiv.org Artificial IntelligenceJul-1-2023

Electronic health records (EHR) contain narrative notes that provide extensive details on the medical condition and management of patients. Natural language processing (NLP) of clinical notes can use observed frequencies of clinical terms as predictive features for downstream applications such as clinical decision making and patient trajectory prediction. However, due to the vast number of highly similar and related clinical concepts, a more effective modeling strategy is to represent clinical terms as semantic embeddings via representation learning and use the low dimensional embeddings as feature vectors for predictive modeling. To achieve efficient representation, fine-tuning pretrained language models with biomedical knowledge graphs may generate better embeddings for biomedical terms than those from standard language models alone. These embeddings can effectively discriminate synonymous pairs of from those that are unrelated. However, they often fail to capture different degrees of similarity or relatedness for concepts that are hierarchical in nature. To overcome this limitation, we propose HiPrBERT, a novel biomedical term representation model trained on additionally complied data that contains hierarchical structures for various biomedical terms. We modify an existing contrastive loss function to extract information from these hierarchies. Our numerical experiments demonstrate that HiPrBERT effectively learns the pair-wise distance from hierarchical information, resulting in a substantially more informative embeddings for further biomedical applications

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2307.00266

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.83)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.32)

Add feedback